
A probability theoretic approach to drifting data in continuous time domains

arXiv.org Machine Learning

December 5, 2019

Abstract

The notion of drift refers to the phenomenon that the distribution underlying the observed data changes over time. Although many attempts have been made to deal with drift, formal notions of drift are application-dependent and formulated with varying degrees of abstraction and mathematical coherence. In this contribution, we provide a probability-theoretic framework that allows a formalization of drift in continuous time and subsumes popular notions of drift. It gives rise to a new characterization of drift in terms of stochastic dependency between data and time. This particularly intuitive formalization enables us to design a new, efficient drift detection method. Further, it yields a technique to decompose observed data into a drifting and a non-drifting part.

Keywords: Online learning, learning theory, stochastic processes, learning with drift, continuous time models, drift decomposition

1 INTRODUCTION

One fundamental assumption in classical machine learning is that observed data are i.i.d. Yet this assumption is often violated as soon as machine learning faces real-world problems: models are subject to seasonal changes, changing demands of individual customers, ageing of sensors, etc. In such settings, lifelong model adaptation rather than classical batch learning is required for optimal performance. Since drift, i.e. the fact that data are no longer identically distributed, is a major issue in many real-world applications of machine learning, many attempts have been made to deal with this setting (Ditzler et al., 2015). Depending on the domain of data and the application, the presence of drift is modelled in different ways. As an example, covariate shift refers to the situation in which the training and test sets have different marginal distributions (Gretton et al., 2009).
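The abstract characterizes drift as stochastic dependency between the observed data and time: if the data are independent of their observation time, there is no drift. The following is a minimal sketch of how such a characterization can be turned into a test, assuming scalar observations with timestamps. It uses a permutation test on the Hilbert-Schmidt independence criterion (HSIC) with Gaussian kernels and a median-heuristic bandwidth as a generic, swapped-in independence measure; it is not the detection method proposed in this contribution, and all function names are hypothetical.

```python
import math
import random
import statistics


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian (RBF) kernel on scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in values]
            for a in values]


def median_bandwidth(values):
    """Median heuristic: median of the nonzero pairwise distances."""
    dists = [abs(a - b) for i, a in enumerate(values) for b in values[i + 1:]]
    nonzero = [d for d in dists if d > 0]
    return statistics.median(nonzero) if nonzero else 1.0


def center(gram):
    """Double centering H K H with H = I - (1/n) * ones * ones^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(kc, l, perm):
    """Biased HSIC estimate trace(Kc L) / n^2, with the time kernel permuted."""
    n = len(kc)
    return sum(kc[i][j] * l[perm[i]][perm[j]]
               for i in range(n) for j in range(n)) / n ** 2


def drift_p_value(data, times, n_permutations=200, seed=0):
    """Permutation test of H0: data independent of time (i.e. no drift)."""
    kc = center(gaussian_gram(data, median_bandwidth(data)))
    l = gaussian_gram(times, median_bandwidth(times))
    identity = list(range(len(data)))
    observed = hsic(kc, l, identity)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling the timestamps simulates data that do not depend on time.
        perm = identity[:]
        rng.shuffle(perm)
        if hsic(kc, l, perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    times = [i / 100 for i in range(100)]
    stationary = [rng.gauss(0, 1) for _ in times]
    drifting = [rng.gauss(3 * t, 1) for t in times]  # mean moves with time
    print("stationary p-value:", drift_p_value(stationary, times))
    print("drifting p-value:  ", drift_p_value(drifting, times))
```

A small p-value indicates that the data depend on time, i.e. that drift is present. Permuting the timestamps rather than the data is a design choice that lets the centered data kernel be computed once; the pure-Python double loop keeps the sketch dependency-free but scales quadratically in the number of samples.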
Learning for data streams extends this setting to an unlimited (but usually countable) stream of observed data, mostly in supervised learning scenarios (Gama et al., 2014). Learning technologies for such situations often rely on windowing techniques and adapt the model based on the characteristics of the data in an observed time window. Active methods explicitly detect drift, usually in the form of a change in the classification error, and trigger model adaptation accordingly, while passive methods adjust the model continuously (Ditzler et al., 2015).
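To make the windowing idea and the notion of an active method concrete, here is a minimal sketch, assuming the learner reports a stream of 0/1 error indicators (1 = misclassified). The hypothetical `WindowedErrorDetector` compares the error rate in a sliding window of recent samples with the rate in a reference window recorded after the last reset, and signals drift when the increase exceeds a fixed threshold. The class name, window size and threshold are illustrative assumptions; this is a simplified stand-in, not one of the published detectors surveyed by Ditzler et al. (2015).

```python
from collections import deque


class WindowedErrorDetector:
    """Hypothetical active drift detector on a stream of 0/1 errors.

    Keeps a reference window (the first errors after a (re)start) and a
    sliding window of the most recent errors, and signals drift when the
    recent error rate exceeds the reference rate by more than `threshold`.
    """

    def __init__(self, window_size=50, threshold=0.2):
        self.window_size = window_size
        self.threshold = threshold
        self.reset()

    def reset(self):
        """Start over, e.g. after the model has been adapted."""
        self.reference = []  # first window_size errors after (re)start
        self.recent = deque(maxlen=self.window_size)  # sliding window

    def update(self, error):
        """Feed one error indicator; return True if drift is signalled."""
        if len(self.reference) < self.window_size:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window_size:
            return False
        ref_rate = sum(self.reference) / self.window_size
        recent_rate = sum(self.recent) / self.window_size
        if recent_rate - ref_rate > self.threshold:
            # An active learner would trigger model adaptation here.
            self.reset()
            return True
        return False


if __name__ == "__main__":
    # Error rate 10% for 200 steps, then 50% (e.g. after a concept change).
    errors = ([1 if i % 10 == 0 else 0 for i in range(200)]
              + [i % 2 for i in range(200)])
    detector = WindowedErrorDetector(window_size=50, threshold=0.2)
    alarms = [i for i, e in enumerate(errors) if detector.update(e)]
    print("drift signalled at steps:", alarms)
```

The fixed threshold keeps the sketch short; practical detectors typically replace it with a statistical bound that accounts for the window size, and a passive method would instead keep updating the model without any explicit alarm.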